1,889 research outputs found

    About Metrics for Clone Detection

    Get PDF
    Clone detectors rely on the concept of similarity and dissimilarity measures to identify cloned fragments. The choice of specific distance function in a clone detector is arbitrary up to some extent. However, with a deeper knowledge of similarity measures, we can condition this choice to have some properties that can help improve scalability and quality of tools. This paper presents some interesting results, insights and questions about similarity and dissimilarity measures, including a somehow counter-intuitive result on the cosine distance

    06301 Abstracts Collection -- Duplication, Redundancy, and Similarity in Software

    Get PDF
    From 23.07.06 to 26.07.06, the Dagstuhl Seminar 06301 ``Duplication, Redundancy, and Similarity in Software\u27\u27 was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available

    Comparison and Evaluation of Clone Detection Tools

    Get PDF
    Many techniques for detecting duplicated source code (software clones) have been proposed in the past. However, it is not yet clear how these techniques compare in terms of recall and precision as well as space and time requirements. This paper presents an experiment that evaluates six clone detectors based on eight large C and Java programs (altogether almost 850 KLOC). Their clone candidates were evaluated by one of the authors as an independent third party. The selected techniques cover the whole spectrum of the state-of-the-art in clone detection. The techniques work on text, lexical and syntactic information, software metrics, and program dependency graphs

    Levenshtein edit distance-based type III clone detection using metric trees

    Get PDF
    This paper presents an original technique for clone detection with metric trees using Levenshtein distance as the metric defined between two code fragments. This approach achieves a faster empirical performance. The resulting clones may be found with varying thresholds allowing type 3 clone detection. Experimental results of metric trees performance as well as clone detection statistics on an open source system are presented and give promising perspectives

    Detection of redundant clone relations based on clone subsumption

    Get PDF
    Clone detection has been presented in the literature at different levels of fragment granularity from functions, to syntactic blocks, to variable length strings of source code or tokens. String matching approaches, prefix and suffix trees, metrics, syntactic approaches and others can be used to compare fragments for similarity. Inclusion relations between source code lines may cause some clone relations to be redundant, when clones code fragments subsume each other. This may occur between nested blocks of source code, for example. An original method to analyze this kind of redundancy in clone relations is presented. The proposed method is based on efficiently combining clone subsumption information together with clone similarity relations on code fragments. The amount of redundancy in clone relations has been evaluated on two open source Java systems, Tomcat and Eclipse. Experimental results are presented. Execution time performance of redundancy analysis is measured and reported. Results are discussed together with further proposed research

    Insider threat resistant SQL-injection prevention in PHP

    Get PDF
    Web sites are either static sites, programs, or databases. Very often they are a mixture of these three aspects integrating relational databases as a back-end. Web sites require configuration and programming attention to assure security, confidentiality, and trustiness of the published information. SQL-injection attacks rely on some weak validation of textual input used to build database queries. Maliciously crafted input may threaten the confidentiality and the security policies of Web sites relying on a database to store and retrieve information. Furthermore, insiders may introduce malicious code in a Web application, code that, when triggered by some specific input, for example, would violate security policies. This paper presents an original approach that combines static analysis, dynamic analysis, and code reengineering to automatically protect applications written in PHP from both malicious input (outsider threats) and malicious code (insider threats) that carry SQLinjection attacks. The paper also reports preliminary results about experiments performed on an old SQL-injection prone version of phpBB (version 2.0.0, 37193 LOC of PHP version 4.2.2 code). Results show that our approach successfully improved phpBB-2.0.0 resistance to SQLinjection attacks

    Mapping features to source code in dynamically configured avionics software

    Get PDF
    Mapping software features to the code that implements them is an important activity for program comprehension and software reengineering. In this paper, we present a novel automated approach to locate features in source code based on static analysis and model checking. This approach focuses on dynamically configured software in which the activation of specific features is controlled by configuration variables. The main advantages of a static approach to feature location are its affordability and applicability to large systems containing hundreds of features. Our methodology is applied to an industrial Flight Management System from the avionics industry. Results show that a static approach to feature mapping is feasible and can locate complex features whose implementation is spread across multiple files and functions

    How to Certify Machine Learning Based Safety-critical Systems? A Systematic Literature Review

    Full text link
    Context: Machine Learning (ML) has been at the heart of many innovations over the past years. However, including it in so-called 'safety-critical' systems such as automotive or aeronautic has proven to be very challenging, since the shift in paradigm that ML brings completely changes traditional certification approaches. Objective: This paper aims to elucidate challenges related to the certification of ML-based safety-critical systems, as well as the solutions that are proposed in the literature to tackle them, answering the question 'How to Certify Machine Learning Based Safety-critical Systems?'. Method: We conduct a Systematic Literature Review (SLR) of research papers published between 2015 to 2020, covering topics related to the certification of ML systems. In total, we identified 217 papers covering topics considered to be the main pillars of ML certification: Robustness, Uncertainty, Explainability, Verification, Safe Reinforcement Learning, and Direct Certification. We analyzed the main trends and problems of each sub-field and provided summaries of the papers extracted. Results: The SLR results highlighted the enthusiasm of the community for this subject, as well as the lack of diversity in terms of datasets and type of models. It also emphasized the need to further develop connections between academia and industries to deepen the domain study. Finally, it also illustrated the necessity to build connections between the above mention main pillars that are for now mainly studied separately. Conclusion: We highlighted current efforts deployed to enable the certification of ML based software systems, and discuss some future research directions.Comment: 60 pages (92 pages with references and complements), submitted to a journal (Automated Software Engineering). Changes: Emphasizing difference traditional software engineering / ML approach. Adding Related Works, Threats to Validity and Complementary Materials. Adding a table listing papers reference for each section/subsection

    Predicting Web site access: an application of time series

    Full text link
    • …
    corecore